efficient evaluation
Efficient Evaluation of LLM Performance with Statistical Guarantees
Wu, Skyler, Nair, Yash, Candès, Emmanuel J.
Exhaustively evaluating many large language models (LLMs) on a large suite of benchmarks is expensive. We cast benchmarking as finite-population inference and, under a fixed query budget, seek tight confidence intervals (CIs) for model accuracy with valid frequentist coverage. We propose Factorized Active Querying (FAQ), which (a) leverages historical information through a Bayesian factor model; (b) adaptively selects questions using a hybrid variance-reduction/active-learning sampling policy; and (c) maintains validity through Proactive Active Inference, a finite-population extension of active inference (Zrnic & Candès, 2024) that enables direct question selection while preserving coverage. With negligible overhead, FAQ delivers up to $5\times$ effective sample size gains over strong baselines on two benchmark suites, across varying levels of historical-data missingness: that is, it matches the CI width of uniform sampling while using up to $5\times$ fewer queries. We release our source code and curated datasets to support reproducible evaluation and future research.
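To make the setup concrete, here is a minimal sketch of the uniform-sampling baseline that FAQ is compared against: a normal-approximation CI for accuracy over a finite benchmark, sampled without replacement. The outcomes, budget, and function name are illustrative assumptions; this is a reference point, not the FAQ method itself.

```python
import numpy as np
from scipy import stats

def uniform_ci(correct, N, alpha=0.05):
    """Normal-approximation CI for mean accuracy over all N questions,
    given len(correct) questions sampled uniformly without replacement."""
    n = len(correct)
    p_hat = correct.mean()
    fpc = (N - n) / (N - 1)                        # finite-population correction
    se = np.sqrt(fpc * p_hat * (1.0 - p_hat) / n)  # standard error of the mean
    z = stats.norm.ppf(1.0 - alpha / 2.0)
    return p_hat - z * se, p_hat + z * se

rng = np.random.default_rng(0)
N, budget = 10_000, 500                            # benchmark size, query budget (made up)
outcomes = (rng.random(N) < 0.7).astype(float)     # hypothetical right/wrong labels
queried = rng.choice(N, size=budget, replace=False)
print(uniform_ci(outcomes[queried], N))
```

The reported $5\times$ effective-sample-size gain means FAQ matches the width of such an interval with roughly one fifth of the budget.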
Efficient Evaluation of Quantization-Effects in Neural Codecs
Mack, Wolfgang, Mustafa, Ahmed, Łaganowski, Rafał, Hijazy, Samer
Neural codecs, comprising an encoder, quantizer, and decoder, enable signal transmission at exceptionally low bitrates. Training these systems requires techniques such as the straight-through estimator, soft-to-hard annealing, or statistical quantizer emulation to obtain a non-zero gradient across the quantizer. Evaluating the effect of quantization in neural codecs, such as the influence of gradient-passing techniques on the overall system, is often costly and time-consuming due to training demands and the lack of affordable and reliable metrics. This paper proposes an efficient evaluation framework for neural codecs that uses simulated data with a defined number of bits and low-complexity neural encoders/decoders to emulate the non-linear behavior of larger networks. Our system is highly efficient in terms of training time and computational and hardware requirements, allowing us to uncover distinct behaviors in neural codecs. Based on our findings, we propose a modification that stabilizes training with the straight-through estimator. We validate our findings against an internal neural audio codec and against the state-of-the-art descript-audio-codec.
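The straight-through estimator mentioned above is the standard trick of quantizing in the forward pass while letting gradients bypass the non-differentiable rounding. A minimal PyTorch sketch, with an illustrative 4-bit uniform quantizer that is an assumption of this example, not the paper's codec:

```python
import torch

def ste_quantize(x, num_bits=4):
    levels = 2 ** num_bits - 1
    x_q = torch.round(x.clamp(0, 1) * levels) / levels  # hard quantization (no gradient)
    # Forward pass returns x_q; backward pass sees the identity, because the
    # non-differentiable term is detached from the graph.
    return x + (x_q - x).detach()

x = torch.rand(8, requires_grad=True)
ste_quantize(x).sum().backward()
print(x.grad)  # all ones: gradients pass straight through the quantizer
```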
AutoFHE: Automated Adaption of CNNs for Efficient Evaluation over FHE
Ao, Wei, Boddeti, Vishnu Naresh
Secure inference of deep convolutional neural networks (CNNs) under RNS-CKKS involves polynomial approximation of unsupported non-linear activation functions. However, existing approaches have three main limitations: 1) Inflexibility: The polynomial approximation and associated homomorphic evaluation architecture are customized manually for each CNN architecture and do not generalize to other networks. 2) Suboptimal Approximation: Each activation function is approximated instead of the function represented by the CNN. 3) Restricted Design: Either high-degree or low-degree polynomial approximations are used. The former retains high accuracy but slows down inference due to bootstrapping operations, while the latter accelerates ciphertext inference but compromises accuracy. To address these limitations, we present AutoFHE, which automatically adapts standard CNNs for secure inference under RNS-CKKS. The key idea is to adopt layerwise mixed-degree polynomial activation functions, which are optimized jointly with the homomorphic evaluation architecture in terms of the placement of bootstrapping operations. The problem is modeled within a multi-objective optimization framework to maximize accuracy and minimize the number of bootstrapping operations. AutoFHE can be applied flexibly on any CNN architecture, and it provides diverse solutions that span the trade-off between accuracy and latency. Experimental evaluation over RNS-CKKS encrypted CIFAR datasets shows that AutoFHE accelerates secure inference by $1.32\times$ to $1.8\times$ compared to methods employing high-degree polynomials. It also improves accuracy by up to 2.56% compared to methods using low-degree polynomials. Lastly, AutoFHE accelerates inference and improves accuracy by $103\times$ and 3.46%, respectively, compared to CNNs under TFHE.
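To see the degree/accuracy trade-off the abstract describes, the sketch below fits least-squares polynomials to ReLU at several degrees: higher degrees approximate better but cost more multiplicative depth, and hence more bootstrapping, under RNS-CKKS. The interval and degrees are illustrative assumptions, not AutoFHE's search space.

```python
import numpy as np

xs = np.linspace(-5, 5, 2001)          # assumed input range for the approximation
relu = np.maximum(xs, 0.0)
for degree in (2, 4, 8, 16):
    coeffs = np.polynomial.polynomial.polyfit(xs, relu, degree)
    approx = np.polynomial.polynomial.polyval(xs, coeffs)
    print(f"degree {degree:2d}: max |error| = {np.abs(approx - relu).max():.4f}")
```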
A Note on the Efficient Evaluation of PAC-Bayes Bounds
When utilising PAC-Bayes theory for risk certification, it is usually necessary to estimate and bound the Gibbs risk of the PAC-Bayes posterior. Many works in the literature employ a method that requires a large number of passes over the dataset, incurring high computational cost. This manuscript presents a very general alternative that achieves computational savings on the order of the dataset size.
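A hedged sketch of the computational contrast at stake, under an assumed toy posterior and loss that are not the manuscript's: scoring $m$ posterior draws on the full dataset costs $m$ passes, while drawing one fresh hypothesis per example estimates the same Gibbs risk in a single pass.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5_000, 50                       # dataset size, number of posterior draws
X, y = rng.normal(size=n), rng.normal(size=n)

def loss(h, x, t):                     # squared loss of a scalar predictor h*x
    return (h * x - t) ** 2

# Illustrative Gaussian posterior over the scalar hypothesis h.
sample_posterior = lambda size: rng.normal(loc=0.5, scale=0.1, size=size)

# m-pass estimator: each of m draws is evaluated on all n points.
risk_m_pass = np.mean([loss(h, X, y).mean() for h in sample_posterior(m)])

# Single-pass estimator: one fresh draw per data point, same target quantity.
risk_1_pass = loss(sample_posterior(n), X, y).mean()
print(risk_m_pass, risk_1_pass)
```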
Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation
Denton, Emily L., Zaremba, Wojciech, Bruna, Joan, LeCun, Yann, Fergus, Rob
We present techniques for speeding up the test-time evaluation of large convolutional networks, designed for object recognition tasks. These models deliver impressive accuracy, but each image evaluation requires millions of floating point operations, making their deployment on smartphones and Internet-scale clusters problematic. The computation is dominated by the convolution operations in the lower layers of the model. We exploit the redundancy present within the convolutional filters to derive approximations that significantly reduce the required computation. Using large state-of-the-art models, we demonstrate speedups of convolutional layers on both CPU and GPU by a factor of 2, while keeping the accuracy within 1% of the original model.
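One common way to exploit that filter redundancy is a truncated SVD of the flattened filter matrix; the NumPy sketch below, with made-up shapes and rank, shows the resulting parameter reduction. In practice the factorization is realized as two smaller convolutions rather than an explicit matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
C_out, C_in, k = 64, 32, 3             # illustrative layer shape

# Trained filters are empirically redundant; synthesize that redundancy here
# as a low-rank matrix plus small noise.
W = rng.normal(size=(C_out, 8)) @ rng.normal(size=(8, C_in * k * k))
W += 0.01 * rng.normal(size=W.shape)

U, S, Vt = np.linalg.svd(W, full_matrices=False)
rank = 16
W_approx = (U[:, :rank] * S[:rank]) @ Vt[:rank]   # truncated SVD (S folded into U)

rel_err = np.linalg.norm(W - W_approx) / np.linalg.norm(W)
print(f"rank {rank}: relative error {rel_err:.4f}, "
      f"params {W.size} -> {U[:, :rank].size + Vt[:rank].size}")
```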